Abstract

A quad-tree-like data structure can be implemented with a content-addressable cache. The resulting structure captures spatial relations at several resolutions. Spatial relations such as containment and contiguity may be useful in flushing Cache-Hough Transform accumulators. Algorithms for accumulator cache management may be written that are related to some proposed in the literature for statistical mode estimation.
1. Background and Overview

The Hough Transform (HT) is a parameter estimation strategy based on the statistical mode [Duda and Hart 1972; Ballard 1981]. More common strategies, such as least-squared-error fitting (e.g., linear regression), are based on the statistical mean. The HT has achieved engineering importance in several areas of image understanding. It is an efficient implementation of generalized matched-filter (template matching) detection, and its mode-based nature makes it highly noise-resistant. Outliers do not affect it, whereas they always have some effect on simple regression (of course, much research in statistics is concerned with rejecting outliers). The HT is an important technique in some massively parallel computing architectures [Feldman and Ballard 1982].

In the HT, features produce votes (in the transform space) for the parameters with which they are compatible. After the voting process, the cell with the most accumulated votes indicates the parameters explaining (consistent with) the most input evidence. HT implementations usually use an N-dimensional array that accumulates votes in discrete cells of N-dimensional parameter space. HT interpretation then usually consists of finding the winning cell (or cells) by searching the array for local or global maxima, using more or less complex algorithms. If the array is considered as a histogram, then the maxima (peaks) correspond to its modes.
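To make the voting and peak-search steps concrete, here is a minimal sketch of an array-based HT. It assumes, purely for illustration, that features are image points (x, y), that lines are parameterized as c = y - m*x with integer slopes m, and that a full 2-D accumulator array is used; none of these choices comes from this report, and the function name is hypothetical. The stray fifth point illustrates the outlier resistance: it adds isolated single votes but cannot move the peak.

```python
def hough_slope_intercept(points, slopes, c_min, c_max):
    """Array-based HT sketch: vote for (slope, intercept) cells, then find the global peak."""
    n_c = c_max - c_min + 1
    acc = [[0] * n_c for _ in slopes]        # full 2-D accumulator array, one row per slope
    for x, y in points:                       # each feature ...
        for i, m in enumerate(slopes):        # ... votes for every compatible parameter cell
            c = y - m * x                     # intercept consistent with (x, y) and slope m
            if c_min <= c <= c_max:           # votes outside the quantized space are dropped
                acc[i][c - c_min] += 1
    # Interpretation: exhaustive search of the array for the global maximum (the mode).
    best = max((acc[i][j], slopes[i], j + c_min)
               for i in range(len(slopes)) for j in range(n_c))
    return acc, best

points = [(0, 1), (1, 3), (2, 5), (3, 7), (10, -50)]   # four collinear points plus an outlier
acc, (votes, m, c) = hough_slope_intercept(points, list(range(-3, 4)), -20, 20)
print(votes, m, c)   # 4 2 1  (the line y = 2x + 1)
```

The exhaustive search over every cell is exactly the cost, in space and time, that the cache-based approach below tries to avoid.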

In an effort to avoid the space requirements inherent in an N-dimensional accumulator array, we at the University of Rochester [Brown and Sher 1982; Brown 1983] proposed to implement vote accumulation and peak finding (mode estimation) in the HT with a content-addressable cache (a hash table is a software equivalent). The cache is smaller than the full accumulator array, but it must be flushed when its capacity is filled to make room for more votes. The hope is that in a space sparsely filled with votes the cache reliably finds the mode using much less storage than the array. Preliminary experiments with a simulated cache and both simulated and real data were undertaken; they are reported in [Brown and Sher 1982] and summarized in Section 3.4.
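Since a hash table is the software equivalent of the content-addressable cache, a dictionary with a fixed capacity is enough to sketch the idea. The class name and the flushing rule below (discard every entry holding the current minimum tally) are hypothetical placeholders; the flushing strategies actually tested are described in Section 3.4.

```python
class TallyCache:
    """Bounded tally cache; a Python dict stands in for content-addressable memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.tallies = {}                     # parameter vector -> vote count

    def vote(self, vector):
        # A vote for a vector not yet cached needs a free entry; flush if the cache is full.
        if vector not in self.tallies and len(self.tallies) >= self.capacity:
            self.flush()
        self.tallies[vector] = self.tallies.get(vector, 0) + 1

    def flush(self):
        # Placeholder policy (an assumption): drop every entry holding the minimum tally.
        low = min(self.tallies.values())
        for vector in [v for v, c in self.tallies.items() if c == low]:
            del self.tallies[vector]

    def mode(self):
        # The surviving entry with the largest tally is the mode estimate.
        return max(self.tallies, key=self.tallies.get)

cache = TallyCache(capacity=3)
for v in [(1, 1), (1, 1), (2, 2), (3, 3), (1, 1), (4, 4), (5, 5), (1, 1)]:
    cache.vote(v)
print(cache.mode(), cache.tallies)   # (1, 1) {(1, 1): 4, (4, 4): 1, (5, 5): 1}
```

Note that the flush decision looks only at individual tallies; nothing in a flat dictionary records which cached vectors are neighbors in parameter space.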

I here propose a cache-flushing technique and associated architecture that may significantly improve the performance of cache-based sequential mode-estimation schemes. The basic strategy is inspired by an iterative mode-estimation scheme in the statistical literature [Robertson and Cryer 1974], which uses spatial contiguity (intervals on the real line). The Robertson and Cryer algorithm is similar to the "converging squares" mode-finding algorithm [O'Gorman et al. 1983; O'Gorman and Sanderson 1983]. The new flushing algorithm takes into account contiguity in parameter space; the old flushing algorithms did not (indeed, could not). The new scheme needs more cache complexity than the old, and its logic is more complex. However, it seems the added complexities are not ruinous, and are perhaps amenable to hardware implementation. The new scheme generalizes most mode-finding algorithms in that it can find multiple modes.

Section 2 outlines the problem and the proposed solution. Section 3 is a list of ideas that are more or less related, and an assessment of their utility. Section 4 explores the proposed scheme in more detail. Some technical details appear in Section 5.


2. The Problem and Proposed Approach

This section is meant to provide context for the basically bottom-up organization of the remainder of this report. One mode-estimation strategy takes samples from a one-dimensional density function, sorts them, and then iteratively constructs ever smaller intervals, each the shortest within the last that contains some number of samples determined by the sample size [Robertson and Cryer 1974, and Section 3.3]. This basic converging strategy easily extends to many dimensions, though this may affect the technical results on convergence, consistency, etc. The iterative convergence behavior is also achieved by the converging squares technique of [O'Gorman and Sanderson 1983].

A convergence algorithm for sequential sampling was proposed by Hall [1982]. The work in this report is an implementation inspired by Hall's suggested approach, filtered through some current computer science ideas and existing implementations, such as cache-based implementations of the Hough transform.

A cache-based Hough scheme [Brown and Sher 1982; Brown, Curtis, and Sher 1983; Section 3.4] maintains a content-addressable cache of vote tallies for parameter vectors. Content-addressability means that geometric relations (e.g., contiguity) between vectors are not necessarily mirrored in the cache data structure. The algorithm of Robertson and Cryer thus differs from current Cache-Hough schemes in two important ways (and some unimportant ones, such as requiring absolutely continuous PDFs for convergence proofs).

1) Their algorithm assumes a fixed sample size, and that all the sampling is completed before it runs.

2) It is based on spatial contiguity (intervals), over which it constructs density measurements.

In both these respects the convergence algorithm has an advantage over Cache-Hough, which must deal with samples collected sequentially and which (so far) has no notion of geometric contiguity, and hence of vote "density" over a finite area. The first difference is fundamental. It is the purpose of this report to address the second difference and endow Cache-Hough with a flushing strategy guided by geometrical contiguity. The resulting cache-management algorithms resemble existing convergence algorithms to an extent, especially as regards the use of intervals converging on the mode. They are much closer to the suggestion of Hall [1982].

The idea is to use a cache version of quad (oct, ..., 2^d) trees to guide flushing of the highest-resolution tally cache (HRC). In the lower-resolution caches (LRCs), each tally entry corresponds to a 2 x 2 x ... x 2, d-dimensional hypercube of the next higher-resolution cells or parameter vectors, and contains the total number of votes in the corresponding higher-resolution hypercube. In the cache version, only cells and vectors with non-zero counts are explicitly represented. Since the vector components are quantized and thus discrete, I shall usually refer to vector (HRC) entries as cells. The hope is that flushing (and perhaps thereafter barring) votes from contiguous hypervolumes of low voting strength renders mode capturing more robust. (This hope is so far unsubstantiated by experiment, and arises from intuition only.)
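A minimal sketch of the level structure follows. It assumes, for illustration only, one dictionary per level standing in for each cache, cell coordinates that are m-bit integers, and that a level-k cell is identified by the k high-order bits of each coordinate (the addressing developed in Section 5.1). The helper names are hypothetical. Only non-zero tallies are stored, as in the cache version described above.

```python
from collections import defaultdict

def level_keys(cell, m):
    """Keys of the cells containing HRC cell `cell` at levels 0..m (level m is the HRC).
    Each coordinate is an m-bit integer; a level-k key keeps its k high-order bits."""
    return [tuple(c >> (m - k) for c in cell) for k in range(m + 1)]

def record_vote(caches, cell, m):
    """One vote increments one tally per level: every cell that contains `cell`."""
    for k, key in enumerate(level_keys(cell, m)):
        caches[k][key] += 1

m = 3                                              # 8 x 8 parameter space (assumed)
caches = [defaultdict(int) for _ in range(m + 1)]  # caches[m] is the HRC, the rest LRCs
for cell in [(5, 6), (5, 7), (4, 6), (1, 0)]:
    record_vote(caches, cell, m)
print(dict(caches[1]))   # {(1, 1): 3, (0, 0): 1}
```

The level-1 tally of 3 already says that three of the four votes fell in one contiguous quarter of parameter space, which is the kind of density information a flat tally cache cannot supply.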
In the proposed scheme, a cache version of a 2^d tree is constructed as votes come in to the HRC. The LRCs implementing lower-resolution levels are updated to be consistent with the HRC. Flushing is triggered by a full HRC, but emanates from some LRC entry, say FlushCell, whose vote count is low, say FlushCount. Vote counts in LRC cells of lower resolution than FlushCell (those containing it) are decremented by FlushCount, and higher-resolution cells in FlushCell's hypervolume are completely removed from the cache. Thus the flush proceeds in both directions, decrementing in the lower-resolution direction and removing in the higher-resolution direction.
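The two-directional flush can be sketched on the same kind of structure. The class and method names are hypothetical, the caches are unbounded dictionaries, and the choice of FlushCell is left to the caller: the report specifies only that its count should be low and that flushing is triggered by a full HRC, neither of which is modeled here. Removing FlushCell itself along with its hypervolume is also an assumption.

```python
from collections import defaultdict

class HierarchicalTallyCache:
    """Sketch of a 2^d tree of tally caches: caches[k] maps level-k cell keys to vote
    totals. Level m is the HRC; levels 0..m-1 are LRCs. Coordinates are m-bit ints."""

    def __init__(self, m):
        self.m = m
        self.caches = [defaultdict(int) for _ in range(m + 1)]

    def key(self, cell, k):
        # Level-k cell containing an HRC cell: keep the k high-order bits per coordinate.
        return tuple(c >> (self.m - k) for c in cell)

    def vote(self, cell):
        for k in range(self.m + 1):
            self.caches[k][self.key(cell, k)] += 1

    def flush(self, level, flush_cell):
        """Flush FlushCell (a level-`level` key) in both directions; return FlushCount."""
        flush_count = self.caches[level].pop(flush_cell)   # assumed: FlushCell goes too
        # Lower-resolution direction: decrement every containing cell by FlushCount.
        for k in range(level):
            anc = tuple(c >> (level - k) for c in flush_cell)
            self.caches[k][anc] -= flush_count
            if self.caches[k][anc] == 0:
                del self.caches[k][anc]
        # Higher-resolution direction: remove every cell inside FlushCell's hypervolume.
        for k in range(level + 1, self.m + 1):
            shift = k - level
            doomed = [key for key in self.caches[k]
                      if tuple(c >> shift for c in key) == flush_cell]
            for key in doomed:
                del self.caches[k][key]
        return flush_count

h = HierarchicalTallyCache(m=3)
for cell in [(5, 6), (5, 7), (4, 6), (1, 0), (1, 1)]:
    h.vote(cell)
flushed = h.flush(1, (0, 0))      # the low-count level-1 cell holding (1, 0) and (1, 1)
print(flushed, dict(h.caches[0]))  # 2 {(0, 0): 3}
```

The flush removes a whole sparse neighborhood at once, while the dense quarter around (5, 6) keeps every one of its HRC entries.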

Current Cache-Hough schemes have only the HRC, and flush on the basis of low individual tallies (optionally using random mercy to preserve a fraction of low tallies). Single-resolution, content-addressable caches are limited to such "pointwise" flushing strategies. The hierarchical cache captures geometrical structure in the data for use by cache-maintenance algorithms. It provides some concept of vote density, can support a version of accumulator-space smoothing, and can implement a flushing algorithm that captures the advantages of convergence schemes without accepting all their limitations. The known convergence algorithms concentrate on finding a single peak. This algorithm penalizes low counts but allows multiple peaks to survive in the cache, thus implementing multiple-mode finding.

3. Related Work

The ideas in Section 2 suggest various results from statistics and computer science. This section mentions several and passes judgement on their utility in this context. Briefly, we conclude the following.

1) Convergence algorithms for mode estimation seem at this writing to be the most directly relevant statistical methods, though they are most at home in unimodal situations. Other problems, such as PDF estimation and the maximum-of-a-sequence problem, are not as relevant.

2) No technical results about the exact problem of mode estimation with finite memory and a discrete sample space have been found in the literature.

3) Cache-Hough methods have promising performance, but suffer from severe myopia when flushing. The incorporation of geometrical contiguity information may help.

4) The usual management of multi-resolution data structures (such as quad (oct, ...) trees, Dynamically Quantized (DQ) spaces, and DQ pyramids) is not suited for this application.

3.1 Nonparametric Multivariate PDF Estimation

One natural idea is to estimate not just the mode of the histogram embodied in the accumulator array, but the corresponding PDF [Wegman 1972, 1982]. PDF estimation is inherently harder than mode estimation, because there is more information to be gleaned from the same input. In fact, some PDF estimation assumes the mode is known. N-dimensional (i.e., multivariate) PDF estimation is thought in the statistical community to be quite difficult and to require many samples, because N-dimensional spaces are (hyper)voluminous. Another difference is that a PDF is usually estimated from a fixed-size, completely gathered sample, rather than from the sequential samples that arise in cache methods. Because of the greater inherent difficulty and the seeming lack of relevant methods, nonparametric PDF estimation does not seem to be an attractive alternative to mode finding.

Often data is neither purely parametric nor completely nonparametric, and partial information about the form of the underlying PDF can be used to advantage [Sager 1983]. Translated into HT terms, Sager advocates ordering N-dimensional cells by their counts and then applying conventional (one-dimensional) rank statistics. In other words, he would use vote-count contours in N dimensions to help describe the PDF. This suggestion appeals to me because sometimes we know the shape of the vote-count surface [Brown 1983], and it could be used to help locate peaks. However, again, the resulting algorithm is not for sequential data.

3.2 The Maximum of a Sequence

The sequential nature of cache-based mode finding leads one to associate it with the Maximum of a Sequence (MaxSeq, Secretary, Beauty Contest, etc.) Problem. The pure version of MaxSeq is: you are presented n face-down slips of paper, on each of which is written an integer, and you are to turn them up sequentially until you decide that the current slip has the maximum integer. What strategy should you use to maximize the probability of choosing the slip with the maximum integer? It is surprising that even under such seemingly underconstrained conditions an optimal strategy exists, and that its winning probability approaches the respectable and elegant value of 1/e as n grows. MaxSeq has been extended in several ways, including allowing a finite buffer of candidates, searching costs, call-backs, and so forth [Gilbert and Mosteller 1966; Smith and Deely 1975; Lorenzen 1979, 1981].
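The 1/e figure can be checked directly for the classical (no-recall) version of the problem. The standard analysis, which is not taken from this report, shows that the optimal rule rejects the first r slips and then accepts the first slip that beats all of them. With r slips rejected, the maximum sits at position i with probability 1/n and is accepted with probability r/(i-1). The sketch evaluates that success probability exactly and searches for the best cutoff; the function names are illustrative.

```python
import math

def success_probability(n, r):
    """P(picking the maximum of n slips) when the first r slips are rejected and the
    first later slip exceeding all earlier ones is accepted (classical, no recall)."""
    if r == 0:
        return 1.0 / n                        # accept the first slip outright
    # The maximum is at position i (prob. 1/n); it is accepted iff the best of the
    # first i-1 slips lies among the r rejected ones (prob. r/(i-1)).
    return (r / n) * sum(1.0 / (i - 1) for i in range(r + 1, n + 1))

def best_cutoff(n):
    """Number of slips to reject that maximizes the success probability."""
    return max(range(n), key=lambda r: success_probability(n, r))

for n in (10, 100, 1000):
    r = best_cutoff(n)
    print(n, r, round(success_probability(n, r), 4), round(1 / math.e, 4))
```

For n = 100 the best rule rejects 37 slips and succeeds with probability about 0.371; as n grows the probability tends to 1/e, about 0.368. The whole analysis depends on final values being presented one at a time, which is exactly the point of mismatch with vote accumulation discussed next.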

Loui [1983] investigated the application of these results to cache-based mode finding; the conclusion is that MaxSeq is a different problem. In HT mode-finding terms, MaxSeq assumes that all the votes are in and saved, that the final tallies are presented sequentially, and that a previously dismissed tally cannot be recalled. The non-recall constraint is especially stringent and non-intuitive in the HT context, and again there is the basic difference that in caching the votes come in sequentially, whereas in MaxSeq the totals are in and they (not individual votes) are presented sequentially.

3.3 Mode Estimation

A common method of estimating the mode of a continuous unimodal distribution from n samples is (in 1-D) to sort them and then find the shortest interval containing some number h(n) of them. The mode is then taken to be some point in that interval (e.g., its midpoint, or the mean or median of the samples it contains) [Venter 1967]. The convergence scheme of Robertson and Cryer is designed to lend robustness to finding the mode of "contaminated" distributions. It refines the interval iteratively, at each stage t finding the shortest subinterval containing $c^{(t)}(n)$ samples, for $t = 1, 2, \ldots, s(n)$ stages [Robertson and Cryer 1974]. Taking $c^{(0)}(n) = n$, the number of samples in the interval is thus cut down on iteration t by the fraction

$$\frac{c^{(t-1)}(n) - c^{(t)}(n)}{c^{(t-1)}(n)}.$$

In all these cases, the technical statistical results have to do with choices of h(n), c(n), and s(n) that yield consistent and quick convergence, and with asymptotic distributions. Experimental results of Robertson and Cryer using a c(n) of approximately 2n/3 to 3n/4 (here n is the number of samples in the interval being refined, not the total number of samples) indicate that with outliers (noise) or contaminated data (multimodal, or in their case a mixture of two distributions) the intervals must start smaller and converge faster to avoid getting confused by local maxima. They recommend that c(n)/n be significantly smaller than 1 - p, where p is the fraction of contaminated data.
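A 1-D sketch of the iterative scheme follows, under stated assumptions: the ratio c(n)/n is held at a constant alpha (0.7, inside the 2/3 to 3/4 range quoted above), a simple stopping rule (stop once at most min_count samples remain) stands in for s(n), and the estimate is the midpoint of the final interval. These choices and the function names are illustrative, not Robertson and Cryer's exact procedure.

```python
import math

def shortest_run(xs, lo, hi, count):
    """Start index of the shortest run of `count` consecutive samples in sorted xs[lo:hi]."""
    return min(range(lo, hi - count + 1), key=lambda i: xs[i + count - 1] - xs[i])

def converging_mode(samples, alpha=0.7, min_count=3):
    """Iteratively keep the shortest subinterval holding c = ceil(alpha * n) of the
    current n samples; return the midpoint of the final interval."""
    xs = sorted(samples)
    lo, hi = 0, len(xs)
    while hi - lo > min_count:
        n = hi - lo
        # c(n): a fixed fraction of the current sample count, kept strictly below n
        c = min(n - 1, max(min_count, math.ceil(alpha * n)))
        lo = shortest_run(xs, lo, hi, c)
        hi = lo + c
    return (xs[lo] + xs[hi - 1]) / 2

cluster = [4.9, 4.95, 5.0, 5.02, 5.05, 5.1]
contamination = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0, 20.0]
print(converging_mode(cluster + contamination))   # an estimate close to 5.0
```

Here more than half the samples are contamination, yet each pass discards only about 30% of the current interval, so the early passes can still keep outliers; the cluster wins once the intervals become short enough to separate it from them.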

The mathematical restrictions that must be imposed on these methods to allow analytic results are fairly severe, but the strategies are clear and appealing and extend to multiple dimensions. They inspired the approach proposed in this paper.

Finding the shortest interval containing a given number of samples requires search. In statistical models the samples are from a continuum, and hence will be duplicated only with probability zero. In the accumulator array search, the samples are discrete, and the cells can have more than a unit count. Thus the search will have to keep an updated sum of counts in a (d-dimensional) rectangular volume, which requires slightly more computational effort than merely counting single samples. Fast techniques exist for running rectangular averages, however.
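In 1-D the discrete version of the search is a sliding window over a histogram of cell counts: extend the window one cell at a time, keep a running total, and advance its left end while the total still reaches h. The function below is a hypothetical illustration of that running-sum idea; it returns the inclusive cell range of the shortest window whose counts total at least h.

```python
def shortest_window(counts, h):
    """Shortest run of consecutive cells whose counts total at least h.
    Returns inclusive (start, end) cell indices, or None if all counts total less than h."""
    best = None
    total = 0        # running sum of counts[start..end]
    start = 0
    for end, c in enumerate(counts):
        total += c                                   # extend the window to the right
        # Advance the left end while the window would still hold at least h votes.
        while start < end and total - counts[start] >= h:
            total -= counts[start]
            start += 1
        if total >= h and (best is None or end - start < best[1] - best[0]):
            best = (start, end)
    return best

counts = [1, 0, 2, 0, 0, 3, 4, 1, 0, 0]   # vote counts in ten consecutive 1-D cells
print(shortest_window(counts, 7), shortest_window(counts, 10))   # (5, 6) (2, 7)
```

Because counts are non-negative, each cell enters and leaves the window at most once, so the whole search is linear in the number of cells even though cells carry multiple votes.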

The iterative search of convergence methods is not more expensive than one-time search. To compare Robertson and Cryer's approach with that of Venter, both presume n sorted 1-D real data. The search for the smallest interval containing $h(n) = c^{(t)}(n)$ of them requires comparing the lengths of $n - h(n) + 1$ intervals (thus $n - h(n)$ comparisons). Robertson and Cryer point out that in the iterative scheme, the number of intervals to compare is

$$\sum_{i=1}^{t} \left[ c^{(i-1)}(n) - c^{(i)}(n) \right] = n - c^{(t)}(n) = n - h(n),$$

since the sum telescopes and $c^{(0)}(n) = n$.

If the interval shrinks by the same factor r on each iteration (each new side is r times the old one), the above result is derivable from geometric series summation, by which it generalizes to d dimensions. Surprisingly, the iterative search in d dimensions takes fewer comparisons than the one-time search in d dimensions. (Remember, this is the number of hypervolume densities (or total occupancies) that must be computed to find the densest interval. It does not include the number of operations needed to compute each total. For that operation, fast running-total algorithms exist [Rosenfeld and Thurston 1971; Narendra 1978].) In the one-time search, the number of hypervolume densities in d-dimensional M x M x ... x M space that must be computed to find the densest h x h x ... x h hypervolume, $h = r^t M$, is

$$C_1 = (M - r^t M)^d = M^d (1 - r^t)^d.$$

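(Here t is an honest exponent, not an index.) In an iterative search, the number of hypervolumes to be considered is

$$\begin{aligned}
C_2 &= (M - rM)^d + (rM - r^2 M)^d + (r^2 M - r^3 M)^d + \cdots + (r^{t-1} M - r^t M)^d \\
&= (1 - r)^d \left( M^d + r^d M^d + \cdots + r^{(t-1)d} M^d \right) \\
&= M^d (1 - r)^d \left( 1 + r^d + r^{2d} + \cdots + r^{(t-1)d} \right) \\
&= M^d (1 - r)^d \, \frac{1 - r^{td}}{1 - r^d}.
\end{aligned}$$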

The ratio $C_2 / C_1$ is the fraction of comparisons that the iterative search must make compared to the one-time search:

$$\frac{C_2}{C_1} = \frac{(1 - r)^d \,(1 - r^{td})}{(1 - r^d)(1 - r^t)^d}.$$

The behavior of this ratio is not obvious from the formulae, although for small r it is approximated (from below) by

$$\frac{1 - dr}{1 - dr^t}.$$

Table 1 gives values of the ratio for relevant r, d, and t. It shows that as the dimension d, the shrink factor r, and the number of iterations t go up, the number of density comparisons falls off.
[Table 1: columns for r = .25, .5, and .75; the table entries are not recoverable from the source.]

Table 1: The fraction of hypervolumes that must be compared in an iterative search for the densest, compared to a non-iterative search.
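The closed form can be checked against the direct sums and used to regenerate ratio values for any r, d, and t. The sketch follows the text's counting convention (M - h windows per axis, ignoring the +1), so its values are not claimed to reproduce the lost entries of Table 1; the function names are illustrative.

```python
def one_time_count(M, r, d, t):
    """C1: windows examined when searching directly for the densest side-(r**t * M) cube."""
    return (M - r**t * M) ** d

def iterative_count(M, r, d, t):
    """C2: windows examined over t iterations, each shrinking the side by the factor r."""
    return sum((r**i * M - r**(i + 1) * M) ** d for i in range(t))

def ratio_closed_form(r, d, t):
    """C2 / C1 from the geometric-series closed form; independent of M."""
    return ((1 - r) ** d * (1 - r ** (t * d))) / ((1 - r ** d) * (1 - r ** t) ** d)

for r in (0.25, 0.5, 0.75):
    print(r, [round(ratio_closed_form(r, d, t), 3) for d in (2, 3, 4) for t in (2, 3)])
```

For example, with r = .5, d = 2, and t = 2 (M = 64), the one-time search examines 48^2 = 2304 windows and the iterative search 32^2 + 16^2 = 1280, a ratio of 5/9.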

In the converging squares algorithm [O'Gorman et al. 1983; O'Gorman and Sanderson 1983], each k x k square (in two dimensions; initially the whole n x n space) yields four smaller overlapping squares: the (k-1) x (k-1) squares in its four corners. The single (k-1) x (k-1) square of maximum density is chosen as the space for the next cycle. The common area between the overlapping squares can be disregarded in computing the differences in density, resulting in substantial computational savings.
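A sketch of the converging-squares loop follows. It assumes a square n x n grid of counts and, for brevity, swaps in a summed-area table for the overlap-difference bookkeeping that gives the published algorithm its savings, so it illustrates the search but not the operation counts. Ties between corners go to the first in top-left, top-right, bottom-left, bottom-right order, an arbitrary choice; the function names are hypothetical.

```python
def summed_area(grid):
    """S[i][j] = sum of grid[:i][:j]; any box sum then costs four lookups."""
    rows, cols = len(grid), len(grid[0])
    S = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            S[i + 1][j + 1] = grid[i][j] + S[i][j + 1] + S[i + 1][j] - S[i][j]
    return S

def box_sum(S, top, left, size):
    bottom, right = top + size, left + size
    return S[bottom][right] - S[top][right] - S[bottom][left] + S[top][left]

def converging_squares(grid):
    """Shrink a k x k square to its densest (k-1) x (k-1) corner until one cell remains."""
    S = summed_area(grid)
    top, left, k = 0, 0, len(grid)           # start from the whole n x n space
    while k > 1:
        k -= 1
        corners = [(top, left), (top, left + 1), (top + 1, left), (top + 1, left + 1)]
        top, left = max(corners, key=lambda c: box_sum(S, c[0], c[1], k))
    return top, left

grid = [[0] * 8 for _ in range(8)]
grid[2][5], grid[2][4], grid[3][5] = 5, 2, 2   # a peak and its neighborhood
grid[7][0] = 3                                  # a weaker, distant distractor
print(converging_squares(grid))                 # (2, 5)
```

In this example the square first drifts toward the distractor, because the early squares can hold it together with the whole peak, and then sheds it once the squares become too small to contain both.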

Comparing the computations needed for converging squares with two other simple mode-finding algorithms:

-- maximum value: $C(n^2 - 1)$

-- smoothing (four points), then maximum value: $3An^2 + C(n^2 - 1)$

-- converging squares: $A(n^2 + 7n - 22) + C(5n - 7)$

where C is a conditional operation and A is an addition operation. Typically a C takes twice as long as an A, and an implementation of converging squares is in fact about three times faster than maximum value and six times faster than smoothing on a VAX 11/780.


3.4 Cache Hough Implementation

The performance of Cache Hough schemes under a variety of conditions (noise, cache length, length of vote bursts, image scanning order, flushing strategy) is tested in [Brown and Sher 1982]. These experimental studies are not backed by formal analysis.

The cache model is that of a single-resolution tally cache (the HRC of Section 2), flushed by either of two strategies. In "Slaughter of Innocents" flushing, all tallies below a threshold are flushed. In "Draft Lottery" or "Random Mercy" flushing, a fraction of all tallies below a threshold is selected at random and flushed. The performance of a caching scheme is measured by the ratio

$$\mathrm{SNR3} = \frac{\text{votes for the vector known to be correct}}{\text{maximum votes for any incorrect vector}}.$$
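The two flushing strategies and the SNR3 measure can be sketched on a plain dictionary of tallies. The function names are hypothetical, the flushed fraction is rounded to a whole number of entries (an assumption), and a random generator is passed in so runs are repeatable. When the correct vector is no longer cached its vote count is taken as 0, matching the SNR3 = 0 trials reported below.

```python
import random

def slaughter_of_innocents(tallies, threshold):
    """Flush every tally below the threshold."""
    for v in [v for v, c in tallies.items() if c < threshold]:
        del tallies[v]

def random_mercy(tallies, threshold, fraction, rng):
    """Flush a randomly chosen fraction of the tallies below the threshold; spare the rest."""
    low = [v for v, c in tallies.items() if c < threshold]
    for v in rng.sample(low, round(fraction * len(low))):
        del tallies[v]

def snr3(tallies, correct):
    """Votes for the known-correct vector over the largest tally of any incorrect vector."""
    wrong = [c for v, c in tallies.items() if v != correct]
    right = tallies.get(correct, 0)          # 0 if the correct vector was flushed
    return right / max(wrong) if wrong else float("inf")

tallies = {"a": 5, "b": 1, "c": 1, "d": 2, "e": 1, "f": 1}   # labels stand in for vectors
spared = dict(tallies)
random_mercy(spared, threshold=2, fraction=0.5, rng=random.Random(0))
slaughter_of_innocents(tallies, threshold=2)
print(tallies, len(spared), snr3(tallies, "a"))   # {'a': 5, 'd': 2} 4 2.5
```

Random mercy keeps some low tallies alive, which gives a vector that is merely early, rather than genuinely weak, a chance to accumulate further votes.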

There are few qualitative surprises in this work. Performance improves with increasing cache length, and falls off with increasing noise and with an increasing fraction of incorrect votes in a vote burst. Scanning strategies seem equally matched except for random with replacement, which may have been prejudiced by the relatively small sample size (400 samples (vote bursts) from a 20 x 20 array of features). The lottery flushing strategy works better than the slaughter strategy. Figure 1 shows some sample results.

After it fills (which can happen after only a few features cause vote bursts), the cache is continuously flushing. Since the content-addressable cache does not maintain contiguity information, a low tally from an active (dense) region of parameter space is as likely to be flushed as a low tally from an inactive (sparse) area. The noise modeled in the experiments is additive noise that does not "spread the peak" in parameter space as quantization noise does. In fact quantization noise is important, and is sometimes taken to be the only important noise effect [Shapiro and Iannino 1979]. It is usually combated by smoothing the accumulator array before searching for modes. Such contiguity-based techniques are difficult and unnatural using only the content-addressable HRC, but become possible in the proposed scheme, which leads to an "urban renewal" flushing strategy in which good neighborhoods are preserved. The analysis of [Brown 1983] shows that when multiple votes are produced for each feature, neighborhoods of high voting strength arise around peaks. Thus the "urban renewal" strategy offered by hierarchical caches seems a promising approach for all known voting schemes.
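Why smoothing helps against quantization noise can be seen on a small sparse tally table: a peak spread over four neighboring cells loses a raw maximum search to a single stronger noise cell, but wins after box smoothing. The sketch is illustrative only (a dictionary of tallies and a (2*radius + 1)^d box, both assumptions). It scatters each count to neighbor keys, which is cheap in a dictionary but not in a bounded content-addressable HRC; that cost is the difficulty noted above.

```python
from collections import defaultdict
from itertools import product

def smooth_sparse(tallies, radius=1):
    """Box-smooth a sparse d-dimensional tally table: each cell's smoothed value is the
    sum of the raw tallies within `radius` cells of it along every axis."""
    smoothed = defaultdict(int)
    for cell, count in tallies.items():
        # Scatter this tally to every cell whose box contains it.
        for offset in product(range(-radius, radius + 1), repeat=len(cell)):
            smoothed[tuple(c + o for c, o in zip(cell, offset))] += count
    return dict(smoothed)

# A peak spread by quantization over four cells, plus one stronger isolated noise cell.
tallies = {(10, 10): 3, (10, 11): 3, (11, 10): 3, (11, 11): 3, (30, 5): 5}
smoothed = smooth_sparse(tallies)
print(max(tallies, key=tallies.get))                 # (30, 5): the raw search picks the noise
print(max(smoothed.values()), smoothed[(10, 10)])    # 12 12
```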
Figure 1: Sample SNR3 histograms for four configurations of cache HT, showing the beneficial effects of increased cache length and of flushing with random mercy. SNR3 > 1 means the correct parameter vector received the most votes. The bimodality of the distributions is unexplained; the peak at SNR3 = 0 represents trials in which the correct vector was not even in the cache after the HT.


3.5 Quad Trees and DQ Methods

Multi-resolution approaches to image understanding and processing have been popular and useful for a long time [Kelly 1971; Warnock 1969]. Keeping the multiple resolutions explicitly in a pyramid data structure has also proven quite useful [Samet 1980; Tanimoto and Pavlidis 1975]. When the resolution pyramid is made of predefined cells, they are usually split symmetrically along each dimension, and the resulting structure is called a quad (oct, ...) tree. I call them 2^d trees here. When the cells are split asymmetrically, or the density of resolution varies over a pyramid level, especially when the splitting varies as the contents of the data structure arrive sequentially, a Dynamically Quantized (DQ) space or pyramid results [O'Rourke 1981; Sloan 1981].

A control program usually adapts these data structures to high-resolution data by generating new cells where data is densest. In 2^d trees, this is done by splitting the lower-resolution cells. In DQ spaces, the data structure is a k-d tree [Bentley 1975], and cells are split and merged as data arrives sequentially. In DQ Pyramids, the number of cells at a level is fixed, but their extent is varied by moving the d-dimensional "crosshairs" that split each level into 2^d parts. Usually the management of the data structures has the goal of producing cells with equal complexity, or numbers of counts. The density of the data is thus mirrored in the data structure (dense data in a region produces a tree that is deeper for that region, for instance). This approach is natural in a sense: it does not require search if enough information is kept in the cells to allow splitting and merging. The data structure and some ancillary information are sufficient to reconstruct the approximate density of the original data, which is inversely proportional to the size of the cells.

DQ structures in the literature suffer from a few difficulties. The cells in DQ spaces can be unintuitive (Figure 2). The splitting and merging algorithms are complex. The cells contain only approximately correct counts after splitting and merging, operations which are controlled by total counts and by count-gradient information within a cell. The high-resolution parameters (locations) of counts are lost. DQ Pyramids, since the number of cells is fixed, have simpler algorithms, but again produce wrong counts as the crosshairs are moved away from their original positions by adaptive warping as data comes in. Despite all this, the DQ structures seem to be practically usable for some applications, including HT accumulation.

The usual count- (or complexity-) equalizing control strategies for hierarchical data structures have an effect that is, in a sense, dual to the one we desire, although it is possible to imagine working with (i.e., around) them. In the cache mode-estimation application there is one cache entry per cell, and it would be best if modes were captured inside single cells instead of distributed across several. Also, the possibility of flushing everything but one cell (at some level) is attractive. Thus the data structures of dynamically quantized structures are useful, but their management algorithms are inherently difficult and in any case can be modified to match our purposes better.
Figure 2: A DQ space with two modes, from [O'Rourke 1981]. The cells in the space result from splitting and merging as data arrive.

4. The Data Structure

4.1 DQ 3^d Tree

The DQ 3^d Tree is a DQ 2^d Tree (Pyramid) modified to be more useful for our purposes. It is a natural data structure to associate with the convergence mode-finding algorithms. The idea is simple, and is shown in Figure 3 for the 3^2 (two-dimensional) case: construct cells that contain and converge on dense areas, rather than splitting dense cells. The search necessary to create these structures is related to that analyzed in Section 3.3. I do not seriously propose this structure for implementation, since adaptive warping would raise all the problems encountered in standard DQ Pyramids, but I offer an approximate and presumably simpler version in the next section.


Figure 3: The DQ 3^d Tree; in two dimensions, the DQ nona-tree. Central cells contain and converge upon dense areas of data. Non-central cells are candidates for flushing from a cache. Multiple modes simply require splitting a non-central cell at some level.

4.2 Unphased 2^d Tree

As an approximation to the DQ 3^d Tree, consider the Unphased 2^d Tree (Fig. 4). It is simply a 2^d tree augmented at each level by a phase-shifted version of the cells. For definiteness, call the usual cells the U cells, and the shifted ones the S cells.
Figure 4: The unphased 2^d tree; in 2-D, the unphased quad tree. The kth layer consists of $2^{kd}$ U (usual or unshifted) cells and $(2^k - 1)^d$ S (shifted) cells. The usual cells are augmented by cells shifted by half a cell size, so the size of the shift depends on the level. The sub-cell inclusion rules for S cells are more complicated than for U cells.

This compromise alleviates (but unfortunately does not cure) the problem of dense areas that lie on predetermined boundaries in the tree, at a considerable computational saving over the (better) solution offered by DQ 3^d Trees. A vote for a parameter vector increments the count of a sequence of cells, namely all those cells containing the vector. Each such U cell may be addressed by the highest-order componentwise bits of the vector. This produces the normal 2^d tree structure. The construction of the out-of-phase cells is treated in some detail in Section 5.1.

The shifted cells are some insurance against splitting a peak over several cells, as is guaranteed to happen in traditional multi-resolution schemes. We do not want this to happen: it loses a resolution level. Worse, it can make the new cells (each with only about $2^{-d}$ of the votes for the mode) vulnerable to flushing. Thus both nontraditional tree management and phase-shifted cells may have a more important effect than just a gain of resolution in the context of cache-based schemes.


5. Technical Details

5.1 Vector Addresses and Arithmetic

The 2^d tree is implemented as a set of separate but communicating caches, one per level for each of S and U cells. The vector addresses have a number of bits that increases by d with each level of increasing resolution. I propose to use a straightforward translation for the address of U (natural) cells in the 2^d tree, based on their Cartesian coordinates in d-space. The S cell locations in natural, Cartesian coordinates do not have the elegant leading-bits relation with their underlying cells, and so are transformed into T addresses that do. U and T addresses must be differentiated.

The following discussion relates to U cells. In order for a parameter vector (address, $d$-dimensional location) to be related to its $2^d$ tree address, it is represented as follows. If $x$ is a $d$-vector of $m$-bit quantities ($m = \log_2 M$),

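$$
x = \begin{array}{cccc}
x_{11} & x_{12} & \cdots & x_{1m} \\
x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots &        & \vdots \\
x_{d1} & x_{d2} & \cdots & x_{dm}
\end{array}
$$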

where the $x_{ij}$ are bits. Then write $x$ as the single bit string ($dm$-vector)

$$y = x_{11}\, x_{21} \cdots x_{d1}\; x_{12}\, x_{22} \cdots x_{d2} \;\cdots\; x_{1m} \cdots x_{dm}.$$

In other words, read the above array of bits out columnwise. Thus the $d$ high-order bits come first, and the $d$ low-order bits come last. We shall need one bit to distinguish U LRC addresses from T LRC addresses. The final form of the address is

$$\text{address} = (U/T)\; y.$$
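Reading the bit array out columnwise is ordinary bit interleaving (a Morton-style ordering). The Python sketch below builds $y$ and the flagged address from a $d$-vector of $m$-bit integer coordinates. The function names, and the choice of a leading 0 for U and 1 for T, are hypothetical; the text says only that one bit distinguishes the two.

```python
def interleave(x, m):
    """Interleave the m-bit coordinates of the d-vector x into one d*m-bit string y,
    high-order bits first: x11 x21 ... xd1 x12 x22 ... xd2 ... xdm."""
    y = 0
    for j in range(m - 1, -1, -1):          # bit position, most significant first
        for xi in x:                        # components in order x1, x2, ..., xd
            y = (y << 1) | ((xi >> j) & 1)
    return y

def make_address(x, m, is_t=False):
    """Prepend the U/T flag bit (hypothetical encoding: 0 = U, 1 = T)."""
    return (int(is_t) << (len(x) * m)) | interleave(x, m)

print(bin(interleave((5, 3), 3)))           # 0b100111
print(bin(make_address((5, 3), 3, True)))   # 0b1100111
```

With $x = (101, 011)$ the columns read out as `10`, `01`, `11`, which is why the first $d$ bits of $y$ already identify the coarsest nontrivial cell.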

Let the HRC be assigned level $m$ and the lowest-resolution cache entry (a single entry counting the total votes in each cache) have level 0. Then at level $k$, $0 < k < m$, $x$'s parameter vector (address) is the bit string of length $kd$ (interpreted as a $d$-vector of $k$-bit quantities)

$$\mathrm{RightShift}(y,\; d(m-k)).$$
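A minimal sketch of the level-$k$ address, assuming addresses are Python integers holding the interleaved bits of $y$ (the U/T flag bit omitted); `level_address` and `deinterleave` are hypothetical helper names. Decoding the truncated string recovers the $d$ coordinates of the enclosing level-$k$ cell, which are just the $k$ high-order bits of each original coordinate.

```python
def level_address(y, d, m, k):
    """Level-k address of the cell containing HRC address y: RightShift(y, d(m-k))."""
    return y >> (d * (m - k))

def deinterleave(a, d, k):
    """Read a k*d-bit interleaved address as a d-vector of k-bit coordinates."""
    x = [0] * d
    for j in range(k):                                   # bit blocks, high-order block first
        block = (a >> (d * (k - 1 - j))) & ((1 << d) - 1)
        for i in range(d):                               # first bit of a block is dimension 1
            x[i] = (x[i] << 1) | ((block >> (d - 1 - i)) & 1)
    return tuple(x)

# The HRC cell (5, 3) with d = 2, m = 3 has y = 0b100111.
y = 0b100111
print(deinterleave(level_address(y, 2, 3, 2), 2, 2))    # (2, 1) = (5 >> 1, 3 >> 1)
```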

Now consider S cells (Fig. 4), which introduce considerable complication. They are shifted by a different amount on every layer. The $k$th layer of the $2^d$ tree has $2^{kd}$ U cells in it. That layer has $(2^k-1)^d$ S cells of the same size. To generate a unique vector address (the T address) for the S cells, I subtract half their linear dimension from their Cartesian (U) addresses. (Think of sliding the 3x3 S cells in layer 2 of a quad tree down so they cover the "lower left" 3x3 square in the 4x4 array of U cells.) This generates a unique address for each S cell: all its members will now have T addresses with identical leading bits ($kd$ of them at level $k$), just like the U addresses.

In a natural way, each U cell on any level $k$ has associated cells on all other levels. They are the cells of higher $k$ whose (hyper)volume it contains and the cells of lower $k$ that contain it. The rule associating U cell addresses at level $k_1$ with a cell address $y$ at level $k$ is the following.

R1) If $k_1 > k$, all cells at level $k_1$ having addresses whose high-order $kd$ bits are the same as $y$ are associated with (lie within) the $k$-level cell at $y$.

R2) If $k_1 < k$, the single cell at level $k_1$ whose address is the first $k_1 d$ bits of $y$ is associated with (includes) the $k$-level cell at $y$.
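Because the address of a level-$k$ cell is a prefix of the addresses of every cell inside it, both rules reduce to shifts on the interleaved integer addresses. A sketch under the same hypothetical integer encoding as above (U/T flag bit omitted); the helper names are ours.

```python
def including_cell(y, d, k, k1):
    """R2 (k1 < k): the level-k1 U cell that includes the level-k cell y
    is the first k1*d bits of y."""
    if not 0 <= k1 < k:
        raise ValueError("R2 needs 0 <= k1 < k")
    return y >> (d * (k - k1))

def included_cells(y, d, k, k1):
    """R1 (k1 > k): the level-k1 U cells lying within the level-k cell y are
    exactly the addresses whose high-order k*d bits equal y."""
    if k1 <= k:
        raise ValueError("R1 needs k1 > k")
    shift = d * (k1 - k)
    return range(y << shift, (y + 1) << shift)

# Quad tree (d = 2): the level-2 cell 9 contains level-3 cells 36..39
# and lies within level-1 cell 2.
print(list(included_cells(9, 2, 2, 3)))   # [36, 37, 38, 39]
print(including_cell(9, 2, 2, 1))         # 2
```

The descendants of a cell form one contiguous block of addresses, a property the flushing and filtering steps can exploit.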

S cells are not so simply associated with each other, since the amount of offset varies from layer to layer. However, an S cell on layer $k$ is made up of $2^d$ U cells on layer $k+1$, and it is this correspondence that is used in the flushing strategy.

Figure  5  shows  the  U,  S,  and  T  coordinates  of  cells  (and  the  association  between 
U  and  S  cells)  in  a  "two-tree,"  where  d  =  1. 
Figure 5: U, S, and T coordinates. The connected brackets show association relations between U cells and other U and S cells. These 1-D coordinates and transformations extend, componentwise, to $d$ dimensions. The UtoT and TtoU transformations are given in the text.

Figure 5 shows that two (in $d$ dimensions, $2^d$) cells at level $k+1$ are included in a cell at level $k$. The transformation UtoT maps U addresses at level $k+1$ to T addresses at level $k$. TtoU maps T addresses at level $k$ back to U addresses at level $k+1$.


UtoT($k$, $d$): Subtract $2^{-(k+1)}$ ($k$ leading 0's and a 1) from the $(k+1)$-bit U address. The leading $k$ bits are the T address.

TtoU($k$, $d$): Take the $k$ bits of the T address and append to them each of the two (in $d$ dimensions, $2^d$) possible configurations of one ($d$) bit(s). Then add $2^{-(k+1)}$ ($k$ leading 0's and a 1). The leading $k+1$ bits of the resulting two ($2^d$) addresses are the U addresses.
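For $d = 1$ the two transformations are a few integer operations if a $(k+1)$-bit U address is held as an integer $u$ counting level-$(k+1)$ cells, so that adding or subtracting $2^{-(k+1)}$ (one unit in the last place) becomes adding or subtracting 1. The sketch below assumes that encoding; the function names mirror the text, but the validity checks are our addition (the first and last U cells at level $k+1$ lie in no S cell).

```python
def u_to_t(u, k):
    """UtoT for d = 1: (k+1)-bit U address at level k+1 -> k-bit T address at level k."""
    if not 0 < u < (1 << (k + 1)) - 1:
        raise ValueError("this U cell lies in no S cell")
    return (u - 1) >> 1                        # subtract one low-order unit, keep leading k bits

def t_to_u(t, k):
    """TtoU for d = 1: the two (k+1)-bit U addresses covered by T cell t at level k."""
    if not 0 <= t < (1 << k) - 1:
        raise ValueError("not a T address at level k")
    return [((t << 1) | b) + 1 for b in (0, 1)]   # append each bit, then add one unit

print(t_to_u(2, 2))                            # [5, 6]
print([u_to_t(u, 2) for u in (5, 6)])          # [2, 2]
```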

UtoT and TtoU are easily extended to operate on the linearized $dm$-vectors in $d$ dimensions, using the usual truth tables for addition and subtraction (Fig. 6).
Figure 6: Addition and subtraction of address vectors. This is simply a carry-ripple operation done bitwise (the $i$th bit in each block belongs to the $i$th dimension's coordinate). Above the lowest block only the carries and borrows need be added and subtracted, since the leading bits of the addend and subtrahend are 0. "Overflow" here results from using illegal operands (trying to compute T addresses for U addresses that do not have them, or vice versa).
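In $d$ dimensions the same steps can run on the interleaved address, with carries and borrows confined to each dimension's bit lane as in Fig. 6. The sketch below uses a well-known dilated-integer trick to get that lane-confined ripple from ordinary integer arithmetic: fill the other lanes with 1s before adding so carries pass through them, or with 0s before subtracting so borrows do. The encoding (dimension 1 owns the first bit of every $d$-bit block) follows Section 5.1; the function names are hypothetical.

```python
def lane_masks(d, nbits):
    """One mask per dimension for an interleaved address with nbits bits per
    coordinate; dimension 1 owns the first (highest) bit of every d-bit block."""
    masks = []
    for i in range(d):
        mask = 0
        for j in range(nbits):
            mask |= 1 << (d * j + (d - 1 - i))
        masks.append(mask)
    return masks

def u_to_t_nd(a, k, d):
    """UtoT on a (k+1)*d-bit interleaved U address at level k+1."""
    out = 0
    for i, mask in enumerate(lane_masks(d, k + 1)):
        lane, one = a & mask, 1 << (d - 1 - i)
        if lane == 0 or lane == mask:                # borrow out, or no S cell
            raise ValueError("this U cell lies in no S cell")
        out |= (lane - one) & mask                   # borrows ripple through the zeroed lanes
    return out >> d                                  # leading k*d bits: the T address

def t_to_u_nd(t, k, d):
    """TtoU: the 2**d interleaved U addresses at level k+1 inside T cell t at level k."""
    if any((t & mask) == mask for mask in lane_masks(d, k)):
        raise ValueError("not a T address at level k")
    full = (1 << ((k + 1) * d)) - 1
    masks = lane_masks(d, k + 1)
    result = []
    for c in range(1 << d):                          # each configuration of d appended bits
        x = (t << d) | c
        out = 0
        for i, mask in enumerate(masks):
            filled = x | (full & ~mask)              # 1s in other lanes pass carries along
            out |= (filled + (1 << (d - 1 - i))) & mask
        result.append(out)
    return result

# Quad tree (d = 2): the single S cell at level 1 covers the four central level-2 U cells.
print(t_to_u_nd(0, 1, 2))                            # [3, 6, 9, 12]
print([u_to_t_nd(a, 1, 2) for a in (3, 6, 9, 12)])   # [0, 0, 0, 0]
```

For $d = 1$ these reduce to the integer operations of the one-dimensional case, and the explicit checks stand in for the "overflow" of Fig. 6.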


A more elegant scheme for accessing the shifted cells might result from a more sophisticated coding scheme. Other space-filling addressing schemes are possible and are potentially useful. The Generalized Balanced Ternary scheme is one such, using hexagonal cells in 2-D, truncated octahedra in 3-D, and in general $(n+1)$-permutahedra in $n$-space [Gibson and Lucas 1982]. We hope to pursue these topics as time allows.

5.2 Flushing Algorithms

Two strategies for flushing the caches seem useful. The first is called static, and seems primarily useful for unimodal accumulators and data that come from a static situation (for example, a random scan of a single image). The second is called dynamic, and seems more suited for multimodal data or data from changing sources (time-varying images or raster scans). Both flushes are initiated by conditions in the HRC, usually that it is close to filling up. In both flushing strategies, S or U LRCs with low counts are found and flushed.

To flush a U cell at level $k$, use UFlush(ACell): remove all of ACell's entries. Decrement, by ACell's count, the counts of the lower-$k$ U cells that include ACell. Remove the entries of higher-resolution (higher-$k$) cells that are included in ACell, namely those whose leading address bits agree with ACell's. Every time a U cell at level $k+1$ is flushed, decrement its associated S cell at level $k$.

To flush an S cell at level $k$, use SFlush(ACell): remove its entries and UFlush its associated U cells at level $k+1$. This latter flush works its way up and down the hierarchy, flushing cells and keeping the S and U counts consistent.
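The two flush procedures can be sketched for $d = 1$ with one dictionary of counts per level for U cells and one for S cells, where S cell $t$ at level $k$ covers U cells $2t+1$ and $2t+2$ at level $k+1$. This is a hypothetical illustration, not the report's implementation. The class and method names are ours, S cells are kept only on levels $0$ to $m-1$, and decrementing the S cell of every decremented ancestor is our reading of "keeping S and U counts consistent".

```python
from collections import defaultdict

class HierCache:
    """Hypothetical d = 1 hierarchy: U[k] and S[k] map addresses to vote counts."""

    def __init__(self, m):
        self.m = m
        self.U = [defaultdict(int) for _ in range(m + 1)]
        self.S = [defaultdict(int) for _ in range(m + 1)]   # S[m] stays empty

    @staticmethod
    def s_of(u, j):
        """T address of the level-(j-1) S cell holding U cell u at level j, or None."""
        if j == 0 or not 0 < u < (1 << j) - 1:
            return None
        return (u - 1) >> 1

    def vote(self, x):
        """One vote for the m-bit HRC address x: bump every U cell containing it
        and the S cell containing each of those U cells."""
        for j in range(self.m + 1):
            u = x >> (self.m - j)
            self.U[j][u] += 1
            t = self.s_of(u, j)
            if t is not None:
                self.S[j - 1][t] += 1

    def _dec_s(self, u, j, n):
        t = self.s_of(u, j)
        if t is not None and t in self.S[j - 1]:
            self.S[j - 1][t] -= n
            if self.S[j - 1][t] <= 0:
                del self.S[j - 1][t]

    def _dec_u(self, u, j, n):
        self.U[j][u] -= n
        if self.U[j][u] <= 0:
            del self.U[j][u]
        self._dec_s(u, j, n)

    def u_flush(self, u, k):
        """UFlush: remove U cell u at level k and every higher-k cell inside it,
        and subtract its count from the lower-k cells that include it."""
        n = self.U[k].pop(u, 0)
        if n == 0:
            return
        self._dec_s(u, k, n)
        for j in range(k):                               # including cells: leading bits of u
            self._dec_u(u >> (k - j), j, n)
        for j in range(k + 1, self.m + 1):               # included cells: leading bits agree
            for a in [a for a in self.U[j] if a >> (j - k) == u]:
                self._dec_s(a, j, self.U[j].pop(a))

    def s_flush(self, t, k):
        """SFlush: remove S cell t at level k, then UFlush its U cells at level k+1."""
        self.S[k].pop(t, None)
        for u in (2 * t + 1, 2 * t + 2):
            self.u_flush(u, k + 1)

c = HierCache(3)
for x in (5, 5, 5, 2):
    c.vote(x)
c.u_flush(1, 1)                              # flush the right half of the space at level 1
print([dict(level) for level in c.U])        # [{0: 1}, {0: 1}, {1: 1}, {2: 1}]
```

After the flush, both the U and S dictionaries match a cache that only ever received the single vote at address 2, which is the consistency the text asks for.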

Flushing could trigger other flushing, as lower-$k$ cells may become flushable through higher-$k$ flushes. In the dynamic algorithm, flushing is all that happens. In static flushing, information is recorded about which cells were flushed, and no new votes in those areas are accepted. This can be implemented in several ways, using filter registers to check the addresses of incoming votes. The registers can contain acceptable or unacceptable ranges of addresses to be checked before votes are inserted in the HRC.
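One way to realize the filter registers, sketched under the interleaved integer encoding of Section 5.1: because all HRC addresses inside a level-$k$ U cell share their leading $kd$ bits, each flushed U cell corresponds to one contiguous range of HRC addresses, so a register can simply hold a half-open range. The class and method names are hypothetical, and only rejected ranges are modeled.

```python
class VoteFilter:
    """Hypothetical filter registers holding rejected ranges [lo, hi) of HRC addresses."""

    def __init__(self, d, m):
        self.d, self.m = d, m
        self.rejected = []

    def block_u_cell(self, u, k):
        """Record a flushed level-k U cell: it covers HRC addresses whose leading k*d bits equal u."""
        shift = self.d * (self.m - k)
        self.rejected.append((u << shift, (u + 1) << shift))

    def accepts(self, y):
        """Check an incoming vote's HRC address before inserting it in the HRC."""
        return not any(lo <= y < hi for lo, hi in self.rejected)

f = VoteFilter(d=2, m=3)
f.block_u_cell(9, 2)                  # level-2 quad-tree cell 9 covers HRC addresses 36..39
print(f.accepts(39), f.accepts(40))   # False True
```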

5.3  Number  of  Lower  Resolution  Cells 

How many lower resolution cells will there be? We can easily put upper and lower bounds on their number, and can appeal to statistics for further intuition.

If the HRC is $2^m$ on a side for $d$ dimensions, there are $2^{md}$ possible HRC cells. There are $2^{(m-1)d}$ possible LRC U cells on level $m-1$, $(2^{md}-1)/(2^d-1)$ LRC U cells in all, and a total of $(2^{(m+1)d}-1)/(2^d-1)$ potential U cells in all caches, spread over $m+1$ levels (down to the single cell at $k = 0$). With increasing $d$ the number of LRC cells approaches the number in the first LRC, or $2^{(m-1)d}$. For example, in the four-level quad tree with 64 HRC cells, there are 85 total U cells, and $64 + 16 = 80$ of these are in the HRC and the first LRC. The S cells approximately double the size of the LRC cache. The entire set of U and S LRCs thus is at worst only about $2^{1-d}$ times as big as the HRC.
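These counts can be checked by summing the per-level sizes directly. The sketch assumes, as in Fig. 4, $2^{kd}$ U cells and $(2^k-1)^d$ S cells on level $k$, and (our assumption) S cells only on the LRC levels $0$ to $m-1$; the function name is hypothetical.

```python
def cell_counts(m, d):
    """Potential cell counts for an HRC 2**m on a side in d dimensions."""
    u_per_level = [2 ** (k * d) for k in range(m + 1)]          # level m is the HRC
    lrc_u = sum(u_per_level[:m])                                 # levels 0 .. m-1
    assert lrc_u == (2 ** (m * d) - 1) // (2 ** d - 1)           # closed form from the text
    assert sum(u_per_level) == (2 ** ((m + 1) * d) - 1) // (2 ** d - 1)
    lrc_s = sum((2 ** k - 1) ** d for k in range(m))             # assumed S levels 0 .. m-1
    return {"hrc": u_per_level[m], "first_lrc": u_per_level[m - 1],
            "lrc_u": lrc_u, "total_u": sum(u_per_level), "lrc_s": lrc_s}

print(cell_counts(3, 2))
# {'hrc': 64, 'first_lrc': 16, 'lrc_u': 21, 'total_u': 85, 'lrc_s': 10}
```

For this quad tree the U and S LRCs together hold 31 cells, roughly half the size of the 64-cell HRC.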
If all the HRC votes are in a small area (a single HRC cell), then only $m+1$ cells are allocated in the U cache, and $m$ in the S cache, for a minimum of some $2m$ cells.

We may wonder how the cache will look between these two extremes. How likely are we to get empty cells that do not appear in the caches? This question is addressed by occupancy statistics [Johnson and Kotz 1977] and urn models [Cohen 1978]. Say we have $C$ cells and $v$ votes; then there are $C^v$ ways to vote into the cells. Suppose we wish to count the number of ways to vote into $C$ cells so that exactly $C - p$ of them are empty ($p$ of them contain votes). For $p$ chosen cells, the number of distinct ways to vote is

$$p!\,\{v,p\},$$

where $\{v,p\}$ is the Stirling number of the second kind (see below). There are $(C,p)$ ways to choose the $p$ full cells, where $(C,p)$ is the binomial coefficient (see below). Thus the fraction of voting trials in which exactly $p$ cells are filled is

$$F = \frac{p!\,(C,p)\,\{v,p\}}{C^v}.$$

This quantity may be interpreted as a probability if each cell is equally likely to receive votes. If $X$ is the number of occupied cells, $\Pr[X = p] = F$ is known as the classical occupancy distribution. If cells are not equally likely to receive votes, the expression becomes extremely complex [Johnson and Kotz 1977, eq. 3.5]. The equal-probability situation minimizes the expected number of empty cells, and so is a worst case [ibid.].
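The fraction $F$ can be computed exactly with rational arithmetic. The sketch below evaluates the Stirling numbers with the recurrence given later in this section; the function names are hypothetical. Summing $F$ over $p$ gives 1, as it must for a distribution.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb, factorial

@lru_cache(maxsize=None)
def stirling2(v, p):
    """{v,p}: ways to partition v votes into exactly p nonempty subsets."""
    if v == p:
        return 1                     # includes the convention {0,0} = 1
    if p == 0 or p > v:
        return 0
    return p * stirling2(v - 1, p) + stirling2(v - 1, p - 1)

def occupancy_fraction(C, v, p):
    """F = p! (C,p) {v,p} / C**v: share of the C**v vote sequences filling exactly p cells."""
    return Fraction(factorial(p) * comb(C, p) * stirling2(v, p), C ** v)

print(sum(occupancy_fraction(8, 8, p) for p in range(1, 9)))   # 1
print(round(float(occupancy_fraction(8, 8, 5)), 4))            # 0.4206
```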

The binomial coefficient $(C,p)$ is $C!/\big((C-p)!\,p!\big)$.

The Stirling number of the second kind $\{v,p\}$ counts the number of ways of partitioning a set of $v$ elements into exactly $p$ nonempty subsets. For $v \ge 1$ we have

$$\{v,0\} = 0,\quad \{v,1\} = 1,\quad \{v,2\} = 2^{v-1} - 1,\quad \{v,v-1\} = (v,2),\quad \{v,v\} = 1,$$

and the recurrence

$$\{v,p\} = p\,\{v-1,p\} + \{v-1,p-1\},$$

which leads to a construction like Pascal's triangle:

v \ p    1     2     3     4     5     6
1        1
2        1     1
3        1     3     1
4        1     7     6     1
5        1    15    25    10     1
6        1    31    90    65    15     1
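The recurrence builds the table row by row, much as Pascal's triangle is built. A short sketch; the function name is hypothetical, and it seeds the recurrence with the usual convention $\{0,0\} = 1$.

```python
def stirling2_rows(vmax):
    """Rows of {v,p} for v = 0..vmax, built with {v,p} = p{v-1,p} + {v-1,p-1}."""
    rows = [[1]]                                  # v = 0: only {0,0} = 1
    for v in range(1, vmax + 1):
        prev = rows[-1] + [0]                     # pad so prev[p] exists for p = v
        row = [0] + [p * prev[p] + prev[p - 1] for p in range(1, v + 1)]
        rows.append(row)
    return rows

for v, row in enumerate(stirling2_rows(6)):
    if v:
        print(v, row[1:])                         # last line: 6 [1, 31, 90, 65, 15, 1]
```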
There is a helpful identity for computing occupancy numbers:

$$p!\,\{v,p\} = (p,p)\,p^v - (p,p-1)\,(p-1)^v + (p,p-2)\,(p-2)^v - \cdots$$

As an example, the distribution of full bins after 8 votes into 8 bins (there are $8^8 = 2^{24}$ possibilities) is approximately (in percent)

Occupied bins     1     2     3     4     5     6     7     8
% of trials      .0   .04     2    17    42    32     7    .2
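The alternating identity turns the percentages into a few lines of integer arithmetic. A sketch with hypothetical function names; it prints full-precision percentages for comparison with the table.

```python
from math import comb

def surjections(v, p):
    """p!{v,p} from the alternating identity: sum over j of (-1)**j (p,p-j) (p-j)**v."""
    return sum((-1) ** j * comb(p, p - j) * (p - j) ** v for j in range(p + 1))

def occupancy_percent(C, v):
    """Percent of the C**v equally likely vote sequences that fill exactly p cells."""
    return {p: 100 * comb(C, p) * surjections(v, p) / C ** v for p in range(1, C + 1)}

for p, pct in occupancy_percent(8, 8).items():
    print(p, round(pct, 2))
```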

Occupancy distributions have been thoroughly studied in the literature [Johnson and Kotz 1977]. They would be expected to occur in studies of caching, and in fact were used to formalize the behavior of working sets [Denning and Schwartz 1972]. For the classical occupancy distribution of interest here, a normal approximation is quite good [Vantilborgh 1974].

Certain limit theorems are known for the classical occupancy distribution. For a fixed number of votes $v$, as the number of cells (here written $p$) goes to infinity, the expected number of empty cells goes to infinity, the expected number of cells with one vote goes to $v$, and the expected number of cells with more than one vote goes to zero. As $v$ and $p$ both go to infinity, with $p\,e^{-v/p} \to w$, the probability that there are $t$ empty cells approaches a limit

$$\lim_{v\to\infty} \Pr[M_0 = t] = \frac{w^t}{e^w\, t!},$$

where $M_0$ is the number of empty cells.

Also, if $v$ and $p$ go to infinity, with $v/p \to w < \infty$, then the limiting standardized distribution of $M_r$ (the number of cells with $r$ votes) is unit normal, and

$$E[M_r] = p\,(v,r)\,p^{-r}\,(1 - 1/p)^{v-r} \sim \frac{p\,w^r}{e^w\, r!},$$

$$\frac{\operatorname{var}(M_r)}{E[M_r]} \sim 1 - \frac{w^r}{r!\,e^w}\left[1 + \frac{(r-w)^2}{w}\right].$$
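A quick numerical check of the expected-value limit under the equal-probability model: the exact $E[M_r]$ is $p$ times a binomial probability, and for large $v$ and $p$ it is close to $p\,w^r/(e^w\, r!)$ with $w = v/p$. The function names and the test case ($v = 8000$ votes into $p = 1000$ cells, so $w = 8$) are ours.

```python
from math import comb, exp, factorial

def expected_cells_with_r(v, p, r):
    """Exact E[M_r]: v votes into p equally likely cells."""
    return p * comb(v, r) * p ** (-r) * (1 - 1 / p) ** (v - r)

def limit_cells_with_r(v, p, r):
    """Large-v, large-p approximation p w**r / (e**w r!) with w = v/p."""
    w = v / p
    return p * w ** r / (exp(w) * factorial(r))

for r in range(0, 13, 3):
    print(r, round(expected_cells_with_r(8000, 1000, r), 2),
          round(limit_cells_with_r(8000, 1000, r), 2))
```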

The last two paragraphs deal with limits of expected values in occupancy distributions, not with the distributions themselves. The results in this section may be useful in calculating expected cache occupancy, or at least in lending some basis for order-of-magnitude calculations should they be desired. It appears that, as a practical matter, the allocation of adequate space for the LRC caches might pay for itself in simplification of the management algorithms.

6.  Conclusions  and  Future  Work 

Real HT data include the effects of quantization error as well as inherent sidelobes surrounding peaks. In practice, accumulator arrays are usually smoothed to gather local evidence into a point. With only a few modes in the accumulator array, large volumes of it will be subject only to votes from noise. In traditional cache-HT, spatial contiguity is lost, and the above observations do us no good. A vote flushing and filtering strategy that makes use of spatial contiguity seems likely to improve cache-HT performance, and this report proposes an architecture and algorithms for that purpose. The strategy is based on statistical mode-estimation algorithms, and the data structure is an augmented version of quad (oct, ..., $2^d$) trees. The management of the data structure differs from the usual in that the goal is to keep votes in the fewest cells possible, rather than to spread them out evenly between cells.

For  a  small  investment  in  space,  a  hierarchical  data  structure  that  keeps  track  of 
geometric  contiguity  can  be  implemented  in  a  cache  environment,  with  vector 
addresses  encoding  the  inclusion  relations  between  multi-resolution  cells.  The 
flushing  algorithms  for  this  structure  are  simple.  Matters  become  more  complicated 
when  an  ancillary  data  structure  of  shifted  cells  is  added  to  cope  with  phasing 
problems  (peaks  being  split  across  predetermined  cell  boundaries).  Some  aspects  of 
the  resulting  structure  are  subject  to  analytic  treatment. 

One  desirable  analytic  problem  that  might  be  feasible  is  the  treatment  of  discrete 
sample  space  mode-estimation  with  finite  memory,  and  in  particular  some  properties 
of  an  iterative  technique  such  as  the  one  proposed  here.  How  often,  say,  will  it  fail  to 
find  the  mode  of  an  analytically  tractable  distribution?  Continuous  approximations 
(say  a  continuous  version  of  the  whole  problem)  begin  to  resemble  known 
convergence  algorithms. 

Dave Sher implemented a software simulator for HRC-only caches [Brown and Sher 1982]. He is now working on a VLSI implementation of a content-addressable tally cache [Sher 1983]. Neither of these implementations incorporates the
hierarchical  structure  discussed  here.  We  have  plans  to  extend  the  software 
simulation  to  hierarchical  flushing  algorithms.  The  relation  of  the  complex  flushing 
algorithms  to  hardware  is  under  study. 

The  next  step  is  to  simulate  this  hierarchical  cache  (initially  only  with  U  cells). 
Methods  for  vote  filtering  should  be  developed  and  tested.  Static  and  dynamic 
flushing  should  be  tried  with  various  scanning  strategies.  If  hierarchical  caches 
perform  significantly  better  than  single-resolution  caches,  we  must  investigate  the 
interaction  of  hierarchical  structure  with  hardware  caches  under  development. 